21 research outputs found
Object-based visual attention for computer vision
Abstract: In this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [Phil. Trans. R. Soc. London B 353 (1998) 1307–1317] is presented. In contrast to the attention mechanisms used in most previous machine vision systems, which drive attention based on the spatial location hypothesis, the mechanisms which direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported.
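The two mechanisms above can be illustrated with a toy sketch (an assumption about the general idea, not the paper's implementation): groupings compete for attention by salience, and selection then descends hierarchically into the winning grouping. All names and the salience definition here are illustrative.

```python
# Toy sketch: salience-driven competition between groupings, followed by
# hierarchical (coarse-to-fine) winner-take-all selection.

def salience(grouping):
    """Hypothetical salience: mean feature contrast of a grouping."""
    values = grouping["contrast"]
    return sum(values) / len(values)

def select_hierarchically(grouping):
    """Winner-take-all descent: pick the most salient sub-grouping
    at each level until an atomic object is reached."""
    path = [grouping["name"]]
    while grouping.get("children"):
        grouping = max(grouping["children"], key=salience)
        path.append(grouping["name"])
    return path

scene = {
    "name": "scene", "contrast": [0.2],
    "children": [
        {"name": "red-group", "contrast": [0.9, 0.8],
         "children": [{"name": "red-disc", "contrast": [0.95]}]},
        {"name": "blue-group", "contrast": [0.3, 0.4]},
    ],
}
print(select_hierarchically(scene))  # ['scene', 'red-group', 'red-disc']
```

The key point the sketch captures is that competition happens *between* groupings at each level, not only within one object.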
Structure of Peer-to-Peer Social Networks
This paper presents a statistical analysis of the structure of Peer-to-Peer
(P2P) social networks that captures social associations of distributed peers in
resource sharing. Peer social networks appear to be mainly composed of pure
resource providers that guarantee high resource availability and reliability of
P2P systems. The major peers that both provide and request resources are only a
small fraction. The connectivity between peers, including undirected, directed
(out and in) and weighted connections, is scale-free and the social networks of
all peers and major peers are small world networks. The analysis also confirms
that peer social networks show in general disassortative correlations, except
that active providers are connected between each other and by active
requesters. The study presented in this paper gives a better understanding of
peer relationships in resource sharing, which may help a better design of
future P2P networks and open the path to the study of transport processes on
top of real P2P topologies. (Comment: APS style, 8 pages, 5 figures and 4 tables. Final version.)
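The scale-free claim above is typically checked by fitting the degree distribution in log-log space. The following is an illustrative sketch of that standard check (not the paper's actual analysis pipeline); the synthetic degree data and the least-squares fit are assumptions for demonstration.

```python
# Estimate the exponent gamma of a power-law degree distribution
# P(k) ~ k^(-gamma) via a least-squares fit in log-log space.
import math
from collections import Counter

def degree_exponent(degrees):
    counts = Counter(degrees)
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k]) for k in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # negated slope is the power-law exponent

# Synthetic degree sequence drawn from an ideal power law P(k) ~ k^-2
degrees = [k for k in range(1, 11) for _ in range(1000 // k**2)]
print(round(degree_exponent(degrees), 1))  # → 2.0
```

A fit like this is only a first diagnostic; rigorous scale-free claims also need goodness-of-fit tests over the distribution tail.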
A Natural Hand Gesture System for People with Brachial Plexus Injuries
This paper focuses on a design case study of a natural hand gesture system for users who retain intact motion control of the metacarpophalangeal joint and thumb basal joint of the hand after brachial plexus injuries. The lexicon of hand gestures had eight entries and was demonstrated to be natural and ergonomic given the limited hand motions. A cooperative multi-cue system was proposed for key hand posture recognition of the proposed hand gestures. We applied the designed system to remote smart-car control and electric wheelchair control. Experimental results demonstrated the robustness and potential feasibility of the system in human-computer interaction for the intended users.
Hierarchical Object-Based Visual Attention for Machine Vision
Institute of Perception, Action and Behaviour
Human vision uses mechanisms of covert attention to selectively process interesting
information and overt eye movements to extend this selectivity ability. Thus, visual
tasks can be effectively dealt with by limited processing resources. Modelling visual
attention for machine vision systems is not only critical but also challenging. In the
machine vision literature there have been many conventional attention models developed
but they are all space-based only and cannot perform object-based selection. In
consequence, they fail to work in real-world visual environments due to the intrinsic
limitations of the space-based attention theory upon which these models are built.
The aim of the work presented in this thesis is to provide a novel human-like visual
selection framework based on the object-based attention theory recently being developed
in psychophysics. The proposed solution, a Hierarchical Object-based Attention
Framework (HOAF) based on grouping competition, consists of two closely-coupled
visual selection models of (1) hierarchical object-based visual (covert) attention and
(2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical
Object-based Attention Model (HOAM) is the primary selection mechanism and the
Object-based Attention-Driven Saccading model (OADS) has a supporting role, both
of which are combined in the integrated visual selection framework HOAF.
This thesis first describes the proposed object-based attention model HOAM which
is the primary component of the selection framework HOAF. The model is based on
recent psychophysical results on object-based visual attention and adopts grouping-based
competition to integrate object-based and space-based attention so as
to achieve object-based hierarchical selectivity. The behaviour of the model is demonstrated
on a number of synthetic images simulating psychophysical experiments and
real-world natural scenes. The experimental results showed that the performance of
our object-based attention model HOAM concurs with the main findings in the psychophysical
literature on object-based and space-based visual attention. Moreover,
HOAM has outstanding hierarchical selectivity from far to near and from coarse to fine
by features, objects, spatial regions, and their groupings in complex natural scenes.
This successful performance arises from three original mechanisms in the model:
grouping-based saliency evaluation, integrated competition between groupings, and
hierarchical selectivity. The model is the first implemented machine vision model of
integrated object-based and space-based visual attention.
The thesis then addresses another proposed model of Object-based Attention-Driven
Saccadic eye movements (OADS) built upon the object-based attention model HOAM,
as an overt saccading component within the object-based selection framework HOAF.
This model, like our object-based attention model HOAM, is also the first implemented
machine vision saccading model which makes a clear distinction between (covert) visual
attention and overt saccading movements in a two-level selection system, an
important feature of human vision but not yet explored in conventional machine vision
saccading systems. In the saccading model OADS, a log-polar retina-like sensor
is employed to simulate the human-like foveation imaging for space variant sensing.
Through a novel mechanism for attention-driven orienting, the sensor fixates on
new destinations determined by object-based attention. Hence it helps attention to
selectively process interesting objects located at the periphery of the whole field of
view to accomplish the large-scale visual selection tasks. By another proposed novel
mechanism for temporary inhibition of return, OADS can simulate the human saccading/
attention behaviour to refixate/reattend interesting objects for further detailed
inspection.
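The space-variant sensing described above can be sketched with a minimal log-polar mapping (an assumption about the general form of such a retina-like sensor, not the thesis implementation; coordinates and the fixation point are illustrative).

```python
# Minimal sketch of space-variant, log-polar sampling around a fixation
# point: resolution is dense near the fovea and compressed in the periphery.
import math

def to_log_polar(x, y, fx, fy):
    """Map an image point (x, y) to log-polar coordinates
    (log radius, angle) around the current fixation (fx, fy)."""
    dx, dy = x - fx, y - fy
    r = math.hypot(dx, dy)
    return math.log(r + 1.0), math.atan2(dy, dx)

# A point near fixation maps to a small log-radius (foveal, high detail);
# a peripheral point is compressed into a slowly growing log-radius.
near = to_log_polar(101, 100, 100, 100)
far = to_log_polar(200, 100, 100, 100)
print(near[0] < far[0])  # True: the periphery lies at larger log-radius
```

Refixating the sensor on an attended object is then just a change of (fx, fy), which is what attention-driven orienting amounts to in this scheme.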
This thesis concludes that the proposed human-like visual selection solution,
HOAF, which is inspired by psychophysical object-based attention theory and grouping-based
competition, is particularly useful for machine vision. HOAF is a general and
effective visual selection framework integrating object-based attention and attention-driven
saccadic eye movements with biological plausibility and object-based hierarchical
selectivity from coarse to fine in a space-time context.
Task-oriented Memory-efficient Pruning-Adapter
The outstanding performance and growing size of Large Language Models have led
to increased attention to parameter-efficient learning. The two predominant
approaches are Adapters and Pruning. Adapters freeze the model and attach a new
trainable weight matrix alongside it, which can significantly reduce the time
and memory cost of training, but at the cost of increased time and memory
consumption during evaluation and testing. Pruning cuts off some weights and
re-distributes the remaining ones, which makes evaluation and testing
relatively cheap but requires extremely high memory and training time. Thus,
efficiency of training and inference cannot be obtained at the same time. In
this work, we propose a task-oriented Pruning-Adapter method that achieves high
memory efficiency during training, speeds up training, and ensures no
significant decrease in accuracy on GLUE tasks, achieving training and
inference efficiency at the same time.
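The two ingredients named in the abstract can be combined in a minimal sketch (a hypothetical illustration, not the paper's method): a frozen pretrained layer, a small trainable adapter, and a task-oriented pruning mask applied to the adapter weights. All names, shapes, and values are illustrative.

```python
# Sketch: output = frozen layer + masked (pruned) adapter contribution.

def matvec(W, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def pruned_adapter_forward(x, W_frozen, W_adapter, mask):
    """Forward pass: frozen path plus the pruned adapter path."""
    W_masked = [[w * m for w, m in zip(row, mrow)]
                for row, mrow in zip(W_adapter, mask)]
    base = matvec(W_frozen, x)
    delta = matvec(W_masked, x)
    return [b + d for b, d in zip(base, delta)]

x = [1.0, 2.0]
W_frozen = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weights
W_adapter = [[0.5, 0.5], [0.5, 0.5]]  # small trainable adapter
mask = [[1, 0], [0, 0]]               # task-oriented pruning mask (binary)
print(pruned_adapter_forward(x, W_frozen, W_adapter, mask))  # [1.5, 2.0]
```

The point of the combination is that only the unmasked adapter entries are trained and stored, so both the training footprint (adapter-sized) and the inference cost (pruned) stay small.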
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
Goal-conditioned hierarchical reinforcement learning (HRL) presents a
promising approach for enabling effective exploration in complex long-horizon
reinforcement learning (RL) tasks via temporal abstraction. Yet, most
goal-conditioned HRL algorithms focus on subgoal discovery while overlooking
inter-level coupling. In essence, for hierarchical systems, the increased
inter-level communication and coordination can induce more stable and robust
policy improvement. Here, we present a goal-conditioned HRL framework with
Guided Cooperation via Model-based Rollout (GCMR), which estimates forward
dynamics to promote inter-level cooperation. The GCMR alleviates the
state-transition error within off-policy correction through a model-based
rollout, further improving the sample efficiency. Meanwhile, to avoid being
disrupted by these corrected but possibly unseen or faraway goals, lower-level
Q-function gradients are constrained using a gradient penalty with a
model-inferred upper bound, leading to a more stable behavioral policy.
Besides, we propose a one-step rollout-based planning method to further facilitate
inter-level cooperation, where the higher-level Q-function is used to guide the
lower-level policy by estimating the value of future states so that global task
information is transmitted downwards to avoid local pitfalls. Experimental
results demonstrate that incorporating the proposed GCMR framework with ACLG, a
disentangled variant of HIGL, yields more stable and robust policy improvement
than baselines and substantially outperforms previous state-of-the-art (SOTA)
HRL algorithms in both hard-exploration problems and robotic control.
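The gradient-penalty idea above can be sketched in a toy form (an assumed hinge-style formulation for illustration; it is not GCMR's exact loss, and the toy Q-function and bound are made up).

```python
# Toy illustration: penalize the Q-function gradient only where it
# exceeds a (model-inferred) upper bound, hinge-style.

def finite_diff_grad(q, x, eps=1e-5):
    """Central finite-difference estimate of dq/dx at x."""
    return (q(x + eps) - q(x - eps)) / (2 * eps)

def gradient_penalty(q, x, upper_bound, coeff=1.0):
    """Squared hinge penalty on gradient magnitude above the bound."""
    g = abs(finite_diff_grad(q, x))
    excess = max(0.0, g - upper_bound)  # zero penalty below the bound
    return coeff * excess ** 2

def q(x):
    """Toy Q-function with constant gradient 3."""
    return 3.0 * x

# Gradient magnitude 3 exceeds the bound 1 by 2, so penalty = 2^2 = 4.
print(gradient_penalty(q, 0.5, upper_bound=1.0))
```

Constraining the lower-level Q-gradients this way keeps the behavioral policy from chasing corrected but far-away goals too aggressively, which is the stability argument made in the abstract.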
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Within the multimodal field, the key to integrating vision and language lies
in establishing a good alignment strategy. Recently, benefiting from the
success of self-supervised learning, significant progress has been made in
multimodal semantic representation based on pre-trained models for vision and
language. However, there is still room for improvement in visual semantic
representation. The lack of spatial semantic coherence and vulnerability to
noise makes it challenging for current pixel- or patch-based methods to
accurately extract complex scene boundaries. To this end, this paper develops
superpixels as a comprehensive and compact representation of learnable image data,
which effectively reduces the number of visual primitives for subsequent
processing by clustering perceptually similar pixels. To mine more precise
topological relations, we propose a Multiscale Difference Graph Convolutional
Network (MDGCN). It parses the entire image as a fine-to-coarse hierarchical
structure of constituent visual patterns, and captures multiscale features by
progressively merging adjacent superpixels as graph nodes. Moreover, we predict
the differences between adjacent nodes through the graph structure,
facilitating key information aggregation of graph nodes to reason actual
semantic relations. Afterward, we design a multi-level fusion rule in a
bottom-up manner to avoid understanding deviation by learning complementary
spatial information at different regional scales. Our proposed method can be
well applied to multiple downstream task learning. Extensive experiments
demonstrate that our method is competitive with other state-of-the-art methods
in visual reasoning. Our code will be released upon publication.
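The superpixel grouping step can be illustrated with a very simplified sketch (not MDGCN itself): adjacent, perceptually similar pixels are greedily merged into regions, which would then serve as graph nodes. The grid, threshold, and merging rule are all illustrative assumptions.

```python
# Simplified sketch: merge adjacent pixels whose intensity difference is
# below a threshold into superpixel-like regions, using union-find.

def merge_regions(grid, threshold):
    """Return the number of regions after merging similar neighbours."""
    h, w = len(grid), len(grid[0])
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right/down neighbours
                if (ny < h and nx < w
                        and abs(grid[y][x] - grid[ny][nx]) <= threshold):
                    union(y * w + x, ny * w + nx)
    return len({find(i) for i in range(h * w)})

image = [[0.1, 0.1, 0.9],
         [0.1, 0.2, 0.9],
         [0.8, 0.8, 0.9]]
print(merge_regions(image, threshold=0.15))  # 2: a dark and a bright region
```

Repeating the merge at progressively looser thresholds gives the fine-to-coarse hierarchy of regions the abstract describes, with each level's regions becoming the graph nodes of the next.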
A computer vision model for visual-object-based attention and eye movements
This is the post-print version of the final paper published in Computer Vision and Image Understanding; copyright 2008 Elsevier B.V.
This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system in a biologically inspired approach. Attention operates at multiple levels of visual selection by space, feature, object and group, depending on the nature of targets and visual tasks. Attentional shifts and gaze shifts are built upon common processing circuits and control mechanisms but are also separated by their different functional roles, working together to fulfil flexible visual selection tasks in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements, resulting in sophisticated performance in complicated natural scenes. The proposed approach aims at exploring a useful visual selection system for computer vision, especially for use in cluttered natural visual environments.
National Natural Science Foundation of China
The pairwise phase consistency in cortical network and its relationship with neuronal activation
Gamma-band neuronal oscillations and synchronization in the 30–90 Hz range are a ubiquitous phenomenon across numerous brain areas and various species, and are correlated with many cognitive functions. The phase of the oscillation, as one aspect of the CTC (Communication through Coherence) hypothesis, underlies various functions in feature coding, memory processing and behavioural performance. The PPC (Pairwise Phase Consistency), an improved coherence measure, statistically quantifies the strength of phase synchronization. In order to evaluate the PPC and its relationships with input stimulus, neuronal activation and firing rate, a simplified spiking neuronal network is constructed to simulate orientation columns in primary visual cortex. If the input orientation stimulus is preferred by a certain orientation column, neurons within this column obtain a higher firing rate and stronger neuronal activation, which consequently engender higher PPC values, with higher PPC corresponding to higher firing rate. In addition, we investigate the PPC in a time-resolved analysis with a sliding window.
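The PPC measure named above has a simple closed form: the average cosine of the phase differences over all distinct pairs of observations. The following is a minimal sketch of that definition (the phase samples are made up for illustration).

```python
# Pairwise Phase Consistency: mean cosine of phase differences over all
# distinct pairs; near 1 for locked phases, near 0 for no synchronization.
import math

def ppc(phases):
    n = len(phases)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(math.cos(phases[i] - phases[j]) for i, j in pairs) / len(pairs)

print(ppc([0.1, 0.1, 0.1, 0.1]))  # 1.0: perfectly locked phases
print(round(ppc([0.0, math.pi / 2, math.pi, 3 * math.pi / 2]), 3))  # -0.333
```

Unlike raw phase-locking estimates, this pairwise form has no sample-size bias in expectation, which is why it is described as an improved coherence measure; the time-resolved analysis simply applies it within a sliding window of spikes.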